
Optimize/streamline fill operations #2395



Merged: 2 commits merged Apr 12, 2025


ncruces (Contributor) commented Apr 11, 2025

This optimizes/streamlines fill operations.

Backstory:

  • since Go 1.21 we have a clear builtin, which is faster than copy when zeroing memory
  • slices.Repeat uses a copy loop very similar to the one we're using
  • bytes.Repeat does the same, but adds an 8KB maximum to the chunk size
  • the existing generated code could be streamlined
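The copy loop mentioned above can be sketched roughly as follows. This is a minimal illustration using a hypothetical generic helper named fillRepeat (not part of wazero or the standard library): write the value once, then repeatedly copy the already-filled prefix onto the remainder, doubling the filled length each round. A maxChunk greater than zero caps each copy, in the spirit of the 8KB limit in bytes.Repeat; zero or less leaves it uncapped, closer to slices.Repeat.

```go
package main

// fillRepeat is a hypothetical helper that sets every element of buf to v
// using a doubling copy loop. If maxChunk > 0, each copy moves at most
// maxChunk elements, keeping the source chunk small; otherwise the chunk
// keeps doubling until the buffer is full.
func fillRepeat[T any](buf []T, v T, maxChunk int) {
	if len(buf) == 0 {
		return
	}
	buf[0] = v
	// i is the number of elements already filled; buf[:i] is all v.
	for i := 1; i < len(buf); {
		chunk := i
		if maxChunk > 0 && chunk > maxChunk {
			chunk = maxChunk
		}
		// copy stops at len(buf[i:]), so the final round may be shorter.
		i += copy(buf[i:], buf[:chunk])
	}
}
```

Capping the chunk does not change the result, only how much data each copy call reads at once.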

Intuition:

  • it's rare for tables to be much larger than 8KB
  • using clear (actually runtime.memclrNoHeapPointers) from native code is harder
  • favor smaller/simpler code for tables, faster code for memory (like the standard library)

For the compiler:

  • streamline/simplify table.fill
  • optimize memory.fill using the 8KB maximum chunk size

For the interpreter:

  • keep table.fill unchanged
  • optimize memory.fill with both clear and maximum chunk size
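A rough sketch of the interpreter-side approach described above, assuming linear memory is a plain []byte and using a hypothetical memoryFill function (the actual wazero interpreter code may differ): a zero value goes straight to clear, and any other value uses a doubling copy capped at 8KB per chunk. Bounds handling follows the WebAssembly rule that the whole range must fit in memory; returning false here stands in for a trap.

```go
package main

// maxFillChunk mirrors the 8KB chunk cap used by bytes.Repeat.
const maxFillChunk = 8192

// memoryFill is a hypothetical sketch of memory.fill over a byte slice:
// set mem[offset:offset+size] to value. It returns false for an
// out-of-bounds range (a real engine would trap instead).
func memoryFill(mem []byte, offset, size uint32, value byte) bool {
	end := uint64(offset) + uint64(size)
	if end > uint64(len(mem)) {
		return false
	}
	buf := mem[uint64(offset):end]
	if len(buf) == 0 {
		return true
	}
	if value == 0 {
		// clear on a byte slice is implemented by runtime.memclrNoHeapPointers
		// (clear is a builtin since Go 1.21).
		clear(buf)
		return true
	}
	buf[0] = value
	for i := 1; i < len(buf); {
		// min is a builtin since Go 1.21; the cap keeps each copy at 8KB or less.
		i += copy(buf[i:], buf[:min(i, maxFillChunk)])
	}
	return true
}
```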

For table.Grow:

  • recognize that zero initialization is already done

Results:

I measured an over 2x improvement when filling megabytes of memory, with no performance degradation at small sizes and no increase in generated code size.

Future work:

Using memory.fill to zero memory is very common. If we could access runtime.memclrNoHeapPointers from the compiler, we could potentially gain another 25% there (especially if we special-cased fills whose value is known to be zero at compile time).

ncruces added 2 commits April 11, 2025 14:01
Signed-off-by: Nuno Cruces <ncruces@users.noreply.github.com>
Signed-off-by: Nuno Cruces <ncruces@users.noreply.github.com>
@ncruces ncruces requested a review from mathetake as a code owner April 11, 2025 13:28
mathetake (Member) left a comment


Fantastic

@mathetake mathetake merged commit 242ae91 into tetratelabs:main Apr 12, 2025
50 checks passed
ncruces (Contributor, Author) commented Apr 12, 2025

That was way faster than I expected. 😂
I kinda expected this to be torn to pieces.

It has since occurred to me that ((i - 1) & 8191) + 1 might not be better than min(i, 8192). Wdyt?

I know too little about the backends to tell which generates better code. “Branches are bad, conditional moves not so much” is about as advanced as my understanding gets.
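One way to compare the two expressions, assuming i is the number of bytes already filled in a doubling copy loop that starts at 1: such a loop only visits 1, 2, 4, and so on up to 8192, and then multiples of 8192. The two forms agree for every i from 1 to 8192 and for every multiple of 8192, but diverge in between (at i = 8193 the masked form yields 1). The hypothetical helpers below walk the loop's reachable values and compare both; they say nothing about which form produces better machine code, which depends on the backend.

```go
package main

const chunkCap = 8192 // must be a power of two for the mask trick

// maskedChunk is the branch-free form; it agrees with min(i, chunkCap) for
// 1 <= i <= chunkCap and for multiples of chunkCap, but not otherwise.
func maskedChunk(i int) int { return ((i - 1) & (chunkCap - 1)) + 1 }

// minChunk is the straightforward form (min is a builtin since Go 1.21).
func minChunk(i int) int { return min(i, chunkCap) }

// sameOnLoop simulates a capped doubling fill of n bytes and reports
// whether both chunk formulas agree at every iteration.
func sameOnLoop(n int) bool {
	for i := 1; i < n; {
		c := minChunk(i)
		if maskedChunk(i) != c {
			return false
		}
		i += min(c, n-i) // what copy would move on this round
	}
	return true
}
```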
